-
-
Notifications
You must be signed in to change notification settings - Fork 18k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Do not cast .transform() output back to input dtype (closes #10972) #15256
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs perf (just run the asv for groupby) and see
Well, each time I run asv, I get different results, but after closing everything and leaving my computer alone, here is what I get:
|
those are not the transform asv |
I run Is there something else I should do for this PR? |
I think Appveyor is complaining about int being int32 on windows instead of int64. Is there a way to cast them to int64 instead? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't change any tests; you have to maintain the existing unless their is a really good reason
tests breaking generally means something is wrong in your code
i suspect your changes are not propagating dtypes correctly
add the tests from the issue
@@ -1805,7 +1805,7 @@ def test_resample_median_bug_1688(self): | |||
datetime(2012, 1, 1, 0, 5, 0)], | |||
dtype=dtype) | |||
|
|||
result = df.resample("T").apply(lambda x: x.mean()) | |||
result = df.resample("T").mean() | |||
exp = df.asfreq('T') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why r u changing tests like this?
@@ -5669,7 +5668,7 @@ def test_ops_general(self): | |||
labels = np.random.randint(0, 50, size=1000).astype(float) | |||
|
|||
for op, targop in ops: | |||
result = getattr(df.groupby(labels), op)().astype(float) | |||
result = getattr(df.groupby(labels), op)() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't change tests
your changes are not robust
Codecov Report@@ Coverage Diff @@
## master #15256 +/- ##
=======================================
Coverage 86.33% 86.33%
=======================================
Files 139 139
Lines 51149 51149
=======================================
Hits 44157 44157
Misses 6992 6992
Continue to review full report at Codecov.
|
@jreback Hum, this is more complex than I thought The initial problem is: If I keep the output type as it is, then Am I missing something obvious? How to proceed from there? |
you need to selectively cast all lost every function should be cast except for things that produce ints (like size) |
So (That's how I stumbled on this issue when working on my PR #12607) |
@nbonnotte of course not, that's the point of this issue! you simply need to infer based on the return values, not automatically cast it. you might be able to simply stick it into a Series and it will work, BUT, you will still need to cast if its a numeric type (IOW a float) to the original type flavor. my points above is that you need to really look at the tests create code that DOESN't change anything except for specific tests (e.g. the test from the actual issue). |
Ok. I'm going to take it slow, starting with datetime input |
this was fixed by 251826f |
It looks so simple once you see the solution... Thanks @jreback for your feedback and your patience, sorry I couldn't get that through |
git diff upstream/master | flake8 --diff
Maybe need more tests, and check performance?